{ "cells": [ { "cell_type": "markdown", "id": "134ed46e-e9ed-4fa7-b10f-5f6aff495b13", "metadata": {}, "source": [ "# Inspecting Model Architecture\n", "\n", "## Objective\n", "\n", "This tutorial shows several ways to inspect a QMzymeModel and its regions at any point in the workflow.\n", "\n", "This workflow allows you to:\n", "\n", "- Examine a QMzymeModel, its QMzymeRegions, and their residues using print statements, `summarize()`, and `print_overview()`.\n", "\n", "In this example, we use ketosteroid isomerase (KSI) as the model system. The KSI structure was obtained from PDB entry [1OH0](https://doi.org/10.2210/pdb1OH0/pdb) and MM-minimized before this tutorial.\n", "\n", "## Classes used in this example\n", "\n", "- [GenerateModel](https://qmzyme.readthedocs.io/en/latest/API/QMzyme.GenerateModel.html)\n", "- [QM_Method](https://qmzyme.readthedocs.io/en/latest/API/QMzyme.CalculateModel.html#qm-treatment)\n", "- [SelectionSchemes](https://qmzyme.readthedocs.io/en/latest/API/QMzyme.SelectionSchemes.html#)\n", "- [DistanceCutoff](https://qmzyme.readthedocs.io/en/latest/API/QMzyme.SelectionSchemes.html#QMzyme.SelectionSchemes.DistanceCutoff)\n", "\n", "## Required Files\n", "\n", "To start, you will need:\n", "\n", "- A fully prepared and protonated PDB file of the reference protein, with the ligand bound (if applicable)\n", "\n", "---" ] }, { "cell_type": "code", "execution_count": null, "id": "e2f5fa6b-8b88-4d44-b3e6-2e3d4690bad3", "metadata": {}, "outputs": [], "source": [ "import QMzyme\n", "from QMzyme.SelectionSchemes import DistanceCutoff\n", "from QMzyme.data import PDB\n", "import pandas as pd" ] }, { "cell_type": "markdown", "id": "98997308-5937-4124-9224-a6c82c002986", "metadata": {}, "source": [ "Before inspecting anything, the cell below builds the model system used throughout this workflow." 
] }, { "cell_type": "code", "execution_count": 8, "id": "0bfb062b-95ee-4dea-9ecf-4600ae6f6546", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "Charge information not present. QMzyme will try to guess region charges based on residue names consistent with AMBER naming conventions (i.e., aspartate: ASP --> Charge: -1, aspartic acid: ASH --> Charge: 0.). See QMzyme.data.residue_charges for the full set.\n", "QMzymeRegion cutoff_3 has an estimated charge of -2.\n", "\n", "Truncated model has been created and saved to attribute 'truncated' and stored in QMzyme.CalculateModel.calculation under key QM. This model will be used to write the calculation input.\n" ] } ], "source": [ "model = QMzyme.GenerateModel(PDB)\n", "QMzyme.data.residue_charges.update({'EQU': -1})\n", "qm_method = QMzyme.QM_Method(\n", " basis_set='6-31G*', \n", " functional='wB97XD', \n", " qm_input='OPT FREQ', \n", " program='gaussian'\n", ")\n", "model.set_catalytic_center(selection='resid 263')\n", "model.set_region(selection='all', name='full_protein')\n", "model.set_region(selection=DistanceCutoff, cutoff=3)\n", "c_alpha_atoms = model.cutoff_3.get_atoms(attribute='name', value='CA')\n", "model.cutoff_3.set_fixed_atoms(atoms=c_alpha_atoms)\n", "qm_method.assign_to_region(region=model.cutoff_3)\n", "model.truncate()" ] }, { "cell_type": "markdown", "id": "1446162f-254f-413a-9582-4868ca105f2e", "metadata": {}, "source": [ "## Using Print Statements\n", "\n", "One of the simplest ways to visualize the setup of your model is by directly printing its attributes. This provides a raw, immediate look at the system." 
] }, { "cell_type": "code", "execution_count": 3, "id": "998d29dc-27e5-4ec8-8a18-0a88d5092f81", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[, , , ]\n", "[, , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , ]\n" ] } ], "source": [ "# Print all QMzymeRegions currently registered in the QMzymeModel.\n", "print(model.regions)\n", "\n", "# Print all QMzymeAtoms in the catalytic_center QMzymeRegion.\n", "print(model.catalytic_center.atoms)" ] },
{ "cell_type": "markdown", "id": "617a7fcc-7e40-4fce-a071-d8491c2eb935", "metadata": {}, "source": [ "## Using `region.summarize()` with a pandas DataFrame\n", "\n", "Next, we examine a region with its `summarize()` method and pandas. Called directly, `summarize()` returns a dictionary that maps each attribute name to a list with one entry per residue in the region." ] },
{ "cell_type": "code", "execution_count": 4, "id": "a05acaef-b6d8-4067-a0b5-631485c1e547", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'Resid': [np.int64(263)],\n", " 'Resname': ['EQU'],\n", " 'Charge': [-1],\n", " 'Removed atoms': [[]],\n", " 'Fixed atoms': [[]],\n", " 'Segids': ['A']}" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model.catalytic_center.summarize()" ] },
{ "cell_type": "markdown", "id": "19da8105-76ac-41e0-a6cc-38ec3fc11616", "metadata": {}, "source": [ "As a region grows, so do these lists. To make the output easier to read, we can pass the dictionary to the Python package `pandas`, which converts it into a table." ] },
{ "cell_type": "code", "execution_count": 5, "id": "42b64c82-3107-429e-abb9-aa8e843cc620", "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr><th></th><th>Resid</th><th>Resname</th><th>Charge</th><th>Removed atoms</th><th>Fixed atoms</th><th>Segids</th></tr>\n", " </thead>\n", " <tbody>\n", " <tr><th>0</th><td>263</td><td>EQU</td><td>-1</td><td>[]</td><td>[]</td><td>A</td></tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " Resid Resname Charge Removed atoms Fixed atoms Segids\n", "0 263 EQU -1 [] [] A" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.DataFrame(model.catalytic_center.summarize())\n", "df" ] },
{ "cell_type": "markdown", "id": "c11e0914-13b8-41c1-975f-af7a9651c8c1", "metadata": {}, "source": [ "This is especially useful for a truncated QMzymeRegion, where the table shows at a glance which atoms were removed from each residue during truncation and which atoms are fixed." ] },
{ "cell_type": "code", "execution_count": 6, "id": "60f2b4db-fdb0-4f5d-b390-91c3487fcc64", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr><th></th><th>Resid</th><th>Resname</th><th>Charge</th><th>Removed atoms</th><th>Fixed atoms</th><th>Segids</th></tr>\n", " </thead>\n", " <tbody>\n",
" <tr><th>0</th><td>16</td><td>TYR</td><td>0</td><td>[N, H, C, O]</td><td>[CA]</td><td>QM</td></tr>\n",
" <tr><th>1</th><td>20</td><td>VAL</td><td>0</td><td>[N, H, C, O]</td><td>[CA]</td><td>QM</td></tr>\n",
" <tr><th>2</th><td>40</td><td>ASP</td><td>-1</td><td>[N, H, C, O]</td><td>[CA]</td><td>QM</td></tr>\n",
" <tr><th>3</th><td>60</td><td>GLY</td><td>0</td><td>[N, H]</td><td>[CA]</td><td>QM</td></tr>\n",
" <tr><th>4</th><td>61</td><td>LEU</td><td>0</td><td>[C, O]</td><td>[CA]</td><td>QM</td></tr>\n",
" <tr><th>5</th><td>66</td><td>VAL</td><td>0</td><td>[N, H, C, O]</td><td>[CA]</td><td>QM</td></tr>\n",
" <tr><th>6</th><td>86</td><td>PHE</td><td>0</td><td>[N, H, C, O]</td><td>[CA]</td><td>QM</td></tr>\n",
" <tr><th>7</th><td>88</td><td>VAL</td><td>0</td><td>[N, H, C, O]</td><td>[CA]</td><td>QM</td></tr>\n",
" <tr><th>8</th><td>90</td><td>MET</td><td>0</td><td>[N, H, C, O]</td><td>[CA]</td><td>QM</td></tr>\n",
" <tr><th>9</th><td>99</td><td>LEU</td><td>0</td><td>[N, H, C, O]</td><td>[CA]</td><td>QM</td></tr>\n",
" <tr><th>10</th><td>101</td><td>VAL</td><td>0</td><td>[N, H, C, O]</td><td>[CA]</td><td>QM</td></tr>\n",
" <tr><th>11</th><td>103</td><td>ASH</td><td>0</td><td>[N, H, C, O]</td><td>[CA]</td><td>QM</td></tr>\n",
" <tr><th>12</th><td>118</td><td>ALA</td><td>0</td><td>[N, H, C, O]</td><td>[CA]</td><td>QM</td></tr>\n",
" <tr><th>13</th><td>120</td><td>TRP</td><td>0</td><td>[N, H, C, O]</td><td>[CA]</td><td>QM</td></tr>\n",
" <tr><th>14</th><td>263</td><td>EQU</td><td>-1</td><td>[]</td><td>[]</td><td>QM</td></tr>\n",
" <tr><th>15</th><td>372</td><td>WAT</td><td>0</td><td>[]</td><td>[]</td><td>QM</td></tr>\n",
" <tr><th>16</th><td>373</td><td>WAT</td><td>0</td><td>[]</td><td>[]</td><td>QM</td></tr>\n",
" <tr><th>17</th><td>376</td><td>WAT</td><td>0</td><td>[]</td><td>[]</td><td>QM</td></tr>\n",
" <tr><th>18</th><td>378</td><td>WAT</td><td>0</td><td>[]</td><td>[]</td><td>QM</td></tr>\n",
" </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " Resid Resname Charge Removed atoms Fixed atoms Segids\n", "0 16 TYR 0 [N, H, C, O] [CA] QM\n", "1 20 VAL 0 [N, H, C, O] [CA] QM\n", "2 40 ASP -1 [N, H, C, O] [CA] QM\n", "3 60 GLY 0 [N, H] [CA] QM\n", "4 61 LEU 0 [C, O] [CA] QM\n", "5 66 VAL 0 [N, H, C, O] [CA] QM\n", "6 86 PHE 0 [N, H, C, O] [CA] QM\n", "7 88 VAL 0 [N, H, C, O] [CA] QM\n", "8 90 MET 0 [N, H, C, O] [CA] QM\n", "9 99 LEU 0 [N, H, C, O] [CA] QM\n", "10 101 VAL 0 [N, H, C, O] [CA] QM\n", "11 103 ASH 0 [N, H, C, O] [CA] QM\n", "12 118 ALA 0 [N, H, C, O] [CA] QM\n", "13 120 TRP 0 [N, H, C, O] [CA] QM\n", "14 263 EQU -1 [] [] QM\n", "15 372 WAT 0 [] [] QM\n", "16 373 WAT 0 [] [] QM\n", "17 376 WAT 0 [] [] QM\n", "18 378 WAT 0 [] [] QM" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.DataFrame(model.cutoff_3_truncated.summarize())\n", "df" ] },
{ "cell_type": "markdown", "id": "32303981-63ff-4523-86e3-0dda68c27ec3", "metadata": {}, "source": [ "## Using print_overview()\n", "\n", "For a higher-level view of the system, call the `print_overview()` method on the model. It reports the total atom, residue, and region counts for the QMzymeModel, followed by the atom and residue counts, assigned method, and selection scheme of each QMzymeRegion." 
] }, { "cell_type": "code", "execution_count": 7, "id": "b3222228-33b2-410e-826e-36090ac76e77", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "-----------------------------\n", "Model Overview: 1oh0 \n", "-----------------------------\n", " - total atoms: 4258\n", " - total residues: 324\n", " - total regions: 4\n", "-----------------------------\n", "Region Overview\n", "-----------------------------\n", "Region Name: catalytic_center\n", " - atoms: 37\n", " - residues: 1\n", " - method: None\n", " - selection_scheme: resid 263\n", "-----------------------------\n", "Region Name: full_protein\n", " - atoms: 4258\n", " - residues: 324\n", " - method: None\n", " - selection_scheme: all\n", "-----------------------------\n", "Region Name: cutoff_3\n", " - atoms: 275\n", " - residues: 19\n", " - method: {'type': 'QM', 'qm_input': '6-31G* wB97XD OPT FREQ', 'basis_set': '6-31G*', 'functional': 'wB97XD', 'qm_end': '', 'program': 'gaussian', 'freeze_atoms': [2, 23, 39, 51, 58, 77, 93, 113, 129, 146, 165, 181, 194, 204], 'mult': 1, 'charge': -2}\n", " - selection_scheme: DistanceCutoff\n", " - cutoff: 3\n", "-----------------------------\n", "Region Name: cutoff_3_truncated\n", " - atoms: 249\n", " - residues: 19\n", " - method: {'type': 'QM', 'qm_input': '6-31G* wB97XD OPT FREQ', 'basis_set': '6-31G*', 'functional': 'wB97XD', 'qm_end': '', 'program': 'gaussian', 'freeze_atoms': [2, 23, 39, 51, 58, 77, 93, 113, 129, 146, 165, 181, 194, 204], 'mult': 1, 'charge': -2}\n", " - selection_scheme: truncated from cutoff_3\n", "-----------------------------\n" ] } ], "source": [ "model.print_overview()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": 
"3.11.0" } }, "nbformat": 4, "nbformat_minor": 5 }